57 research outputs found

    Computation- and Space-Efficient Implementation of SSA

    The computational complexity of the individual steps of basic SSA is discussed. It is shown that general-purpose "blackbox" routines (e.g. those found in packages like LAPACK) waste a great deal of computing time because they do not take the special Hankel structure of the trajectory matrix into account. We outline several state-of-the-art algorithms (for example, Lanczos-based truncated SVD) that can be modified to exploit the structure of the trajectory matrix. The key components here are Hankel matrix-vector multiplication and the hankelization operator. We show that both can be computed efficiently by means of the Fast Fourier Transform. These methods reduce the worst-case computational complexity from O(N^3) to O(k N log(N)), where N is the series length and k is the number of eigentriples desired. Comment: 27 pages, 8 figures
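
    As a minimal sketch of the FFT idea mentioned in the abstract above (not the authors' implementation), the product of an L x K Hankel trajectory matrix with a vector can be written as a convolution of the series with the reversed vector. The function name `hankel_matvec` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def hankel_matvec(x, v):
    """Multiply the Hankel trajectory matrix of series x by vector v.

    The trajectory matrix X has entries X[i, j] = x[i + j], with
    K = len(v) columns and L = len(x) - K + 1 rows. The matrix is
    never formed; the product is read off an FFT-based convolution.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    K = v.size
    L = x.size - K + 1
    if L < 1:
        raise ValueError("v must not be longer than x")

    # Zero-pad to the full linear-convolution length to avoid wrap-around.
    n = x.size + K - 1
    conv = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(v[::-1], n), n)

    # Entry i of X @ v equals element i + K - 1 of conv(x, reversed v).
    return conv[K - 1:K - 1 + L]


def hankel_dense(x, K):
    """Explicit L x K trajectory matrix, used only to check the result."""
    x = np.asarray(x, dtype=float)
    L = x.size - K + 1
    return np.array([x[i:i + K] for i in range(L)])
```

    For a series of length N, the FFT route costs O(N log N) per product, whereas the dense product costs O(L K). On a small input, `hankel_matvec(x, v)` should agree with `hankel_dense(x, len(v)) @ v` up to floating-point rounding.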

    Assessing the Significance of Peptide Spectrum Match Scores

    Peptidic Natural Products (PNPs) are highly sought-after bioactive compounds that include many antibiotic, antiviral and antitumor agents, immunosuppressors and toxins. Even though recent advancements in mass spectrometry have led to the development of accurate sequencing methods for nonlinear (cyclic and branch-cyclic) peptides, requiring only picograms of input material, the identification of PNPs via a database search of mass spectra remains problematic. This holds particularly true when trying to evaluate the statistical significance of Peptide Spectrum Matches (PSMs), especially when working with nonlinear peptides, which often contain non-standard amino acids and modifications and have an overall complex structure. In this paper we describe a new way of estimating the statistical significance of a PSM, defined by any peptide (linear or nonlinear), using state-of-the-art Markov Chain Monte Carlo methods. In addition to the estimate itself, our method provides an uncertainty estimate in the form of confidence bounds, as well as an automatic simulation stopping rule that ensures that the sample size is sufficient to achieve the desired level of accuracy.

    Basic Singular Spectrum Analysis and Forecasting with R

    Singular Spectrum Analysis (SSA) as a tool for the analysis and forecasting of time series is considered. The main features of the Rssa package, which implements the SSA algorithms and methodology in R, are described, and examples of its use are presented. Analysis, forecasting and parameter estimation are demonstrated by means of a case study with accompanying code in R.

    Multivariate and 2D Extensions of Singular Spectrum Analysis with the Rssa Package

    Implementation of multivariate and 2D extensions of singular spectrum analysis (SSA) by means of the R package Rssa is considered. The extensions include MSSA for simultaneous analysis and forecasting of several time series and 2D-SSA for analysis of digital images. A new extension of 2D-SSA called shaped 2D-SSA is introduced for the analysis of images of arbitrary shape, not necessarily rectangular. It is shown that the implementation of shaped 2D-SSA can serve as a basis for the implementation of MSSA and other generalizations. Efficient implementation of operations with Hankel and Hankel-block-Hankel matrices through the fast Fourier transform is suggested. Examples with code fragments in R, which explain the methodology and demonstrate the proper use of Rssa, are presented.

    MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data

    While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data make it possible to determine which genes are active in the community and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects related to metatranscriptome analysis increases every year and the scope of its applications expands. However, several problems complicate metatranscriptome analysis: the complexity of microbial communities, the wide dynamic range of transcriptome expression and, importantly, the lack of high-quality computational methods for assembling meta-RNA sequencing data. These factors reduce the contiguity and completeness of metatranscriptome assemblies and therefore affect downstream analysis. Here we present MetaGT, a pipeline for de novo assembly of metatranscriptomes, which is based on the idea of combining metatranscriptomic and metagenomic data sequenced from the same sample. MetaGT assembles metatranscriptomic contigs and fills in missing regions based on their alignments to the metagenome assembly. This approach makes it possible to overcome the described complexities, obtain complete RNA sequences and, additionally, estimate their abundances. Using various publicly available real and simulated datasets, we demonstrate that MetaGT yields a significant improvement in the coverage and completeness of metatranscriptome assemblies compared to existing methods that do not exploit metagenomic data. The pipeline is implemented in NextFlow and is freely available from https://github.com/ablab/metaGT. Peer reviewed

    A novel uncultured heterotrophic bacterial associate of the cyanobacterium Moorea producens JHB

    Background: Filamentous tropical marine cyanobacteria such as Moorea producens strain JHB possess a rich community of heterotrophic bacteria on their polysaccharide sheaths; however, these bacterial communities have not yet been adequately studied or characterized. Results and discussion: Through efforts to sequence the genome of this cyanobacterial strain, the 5.99 MB genome of an unknown bacterium emerged from the metagenomic information, named here as Mor1. Analysis of its genome revealed that the bacterium is heterotrophic and belongs to the phylum Acidobacteria, subgroup 22; however, it is only 85 % identical to the nearest cultured representative. Comparative genomics further revealed that Mor1 has a large number of genes involved in transcriptional regulation, is completely devoid of transposases, is not able to synthesize the full complement of proteinogenic amino acids and appears to lack genes for nitrate uptake. Mor1 was found to be present in lab cultures of M. producens collected from various locations, but not in other cyanobacterial species. Diverse efforts failed to culture the bacterium separately from filaments of M. producens JHB. Additionally, a co-culturing experiment between M. producens JHB possessing Mor1 and cultures of other genera of cyanobacteria indicated that the bacterium was not transferable. Conclusion: The data presented support a specific relationship between this novel uncultured bacterium and M. producens; however, this proposed relationship cannot be verified until the "uncultured" bacterium can be cultured.

    Improving Switch Lowering for The LLVM Compiler System

    Switch-case statements (or switches) provide a natural way to express multiway branching control flow. They are common in many applications, including compilers, parsers, text processing programs and virtual machines. Various optimizations for switches have been studied for many years. This paper describes the switch lowering refactoring recently made in the LLVM Compiler System.

    Solid-state fault current limiter for medium voltage distribution systems

    This paper presents a thyristor-controlled fault current limiter for medium voltage distribution systems (6-10 kV). The main goal of the proposed scheme is to alleviate the effect of the fault current on the switchgear and other equipment. The limiter is designed mainly for application in industries with large motors installed, where the motors' contribution to the fault current is significant. A model of the power system with the limiter has been developed using ATP-EMTP. Simulation results showing the effectiveness of the limiter are presented in the paper. © 2003 IEEE

    Singular spectrum analysis with R

    This comprehensive and richly illustrated volume provides up-to-date material on Singular Spectrum Analysis (SSA). SSA is a well-known methodology for the analysis and forecasting of time series. More recently, SSA has also been used to analyze digital images and other objects that are not necessarily of planar or rectangular form and may contain gaps. SSA is multi-purpose and naturally combines both model-free and parametric techniques, which makes it a very special and attractive methodology for solving a wide range of problems arising in diverse areas, most notably those associated with time series and digital images. An effective, comfortable and accessible implementation of SSA is provided by the R package Rssa, which is available from CRAN and reviewed in this book. Written by prominent statisticians who have extensive experience with SSA, the book (a) presents the up-to-date SSA methodology, including multidimensional extensions, in language accessible to a large circle of users, (b) combines different versions of SSA into a single tool, (c) shows the diverse tasks that SSA can be used for, (d) formally describes the main SSA methods and algorithms, and (e) provides tutorials on the Rssa package and the use of SSA. The book offers a valuable resource for a very wide readership, including professional statisticians, specialists in signal and image processing, and specialists in numerous applied disciplines interested in using statistical methods for time series analysis, forecasting, and signal and image processing. The book is written at a level accessible to a broad audience and includes a wealth of examples; hence it can also be used as a textbook for undergraduate and postgraduate courses on time series analysis and signal processing.